Analysis of incorrect POS-tagging in student texts with linguistic errors in German
Abstract
The electronic learner corpus of student texts in German, the PACT, contains part-of-speech (POS) tagging. This markup is performed automatically using RFTagger. Since the texts are written by students, they may contain various kinds of errors: grammatical, spelling, stylistic, and others. Sentences may be formulated incorrectly, without taking into account the rules of the language and accepted norms. This can affect the work of programs that process texts in automatic mode and, as a result, generate incorrect tagging that needs to be verified manually. The purpose of the study is to investigate the degree of influence of errors in non-authentic texts on the results of part-of-speech tagging. Based on expert error tagging of the texts, 11 types of errors were identified that can affect tagger quality. For each type of error, ten sentences containing an error were selected from the corpus. The resulting pool of sentences was processed by the taggers RFTagger and TreeTagger. The parts of speech suggested by these taggers were compared with the parts of speech determined by experts. As a result of the comparison, the following patterns were revealed: the taggers are mistaken when the non-declinable form of an adjective is written instead of the declinable one; when one word is written separately; in the absence of the suffix "-er" in possessive adjectives formed from geographical names; when nouns are written with a lowercase letter; and when a verb is written with a capital letter. In each case, the article provides an analysis of the forms and causes of incorrect POS-tagging, as well as the differences between the results of the two taggers. Taking the revealed patterns into account will allow more efficient organization of POS-tagging verification in student texts in German. The results will also be useful for developers.
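The comparison step described in the abstract (tagger output checked against expert tags, grouped by error type) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the sample sentence, the STTS-style tags, the error-type label `verb_capitalized`, and the placeholder tagger names `tagger_a` and `tagger_b` are all invented here, and no real RFTagger or TreeTagger output is shown.

```python
from collections import defaultdict


def compare_with_gold(samples):
    """Compare each tagger's tags with expert (gold) tags.

    Returns:
      stats[error_type][tagger] -> [wrong_tokens, total_tokens]
      mismatches -> list of (error_type, tagger, token, gold_tag, predicted_tag)
    """
    stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    mismatches = []
    for sample in samples:
        tokens, gold = sample["tokens"], sample["gold"]
        for tagger, predicted in sample["predicted"].items():
            if not (len(tokens) == len(gold) == len(predicted)):
                raise ValueError("tokens, gold tags and predicted tags must align")
            counts = stats[sample["error_type"]][tagger]
            for token, gold_tag, pred_tag in zip(tokens, gold, predicted):
                counts[1] += 1
                if pred_tag != gold_tag:
                    counts[0] += 1
                    mismatches.append(
                        (sample["error_type"], tagger, token, gold_tag, pred_tag)
                    )
    return stats, mismatches


# Hypothetical sample: a verb wrongly written with a capital letter ("Gehe").
# The tags below are made up for illustration, not real tagger output.
samples = [
    {
        "error_type": "verb_capitalized",
        "tokens": ["Ich", "Gehe", "nach", "Hause"],
        "gold": ["PPER", "VVFIN", "APPR", "NN"],
        "predicted": {
            "tagger_a": ["PPER", "NN", "APPR", "NN"],
            "tagger_b": ["PPER", "VVFIN", "APPR", "NN"],
        },
    },
]

stats, mismatches = compare_with_gold(samples)
for error_type, per_tagger in stats.items():
    for tagger, (wrong, total) in per_tagger.items():
        print(f"{error_type} / {tagger}: {wrong} of {total} tokens differ from gold")
```

Keeping the counts per error type and per tagger makes it easy to see which kinds of learner errors each tagger is sensitive to, which is the question the study asks.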
Related articles
PoS-tagging Italian texts with CORISTagger
This paper presents an evolution of CORISTagger [1], a high-performance PoS-tagger for Italian developed at the University of Bologna. The system is composed of a second-order Hidden Markov Model tagger followed by a Transformation Based tagger. The use of such a stacked structure, paired with a powerful morphological analyser based on a large lexicon composed of 120,000 lemmas, allowed the ta...
Fine-Grained POS Tagging of German Tweets
This paper presents the first work on POS tagging German Twitter data, showing that despite the noisy and often cryptic nature of the data a fine-grained analysis of POS tags on Twitter microtext is feasible. Our CRF-based tagger achieves an accuracy of around 89% when trained on LDA word clusters, features from an automatically created dictionary and additional out-of-domain training data.
POS Tagging for Historical Texts with Sparse Training Data
This paper presents a method for part-of-speech tagging of historical data and evaluates it on texts from different corpora of historical German (15th–18th century). Spelling normalization is used to preprocess the texts before applying a POS tagger trained on modern German corpora. Using only 250 manually normalized tokens as training data, the tagging accuracy of a manuscript from the 15th cen...
Analysis of power in the network society
Thinkers and scholars in the social sciences hold that a new stage in the history of human societies has begun. The features of this new society include phenomena such as the global informational economy, the variable geometry of networks, the culture of real virtuality, the astonishing development of digital technologies, online services, and the compression of time and space. On the other hand, power, as the central subject of political science, occupies an important place in human relations; power and the reproduction...
Collocation errors in translations of the Holy Quran
The present study aims at identifying, classifying and analyzing collocation errors made by translators of the Holy Quran into English. Findings indicated that, in terms of collocation, the most acceptable translation was done by Irving and the least appropriate one by Pickthall.
Journal
Journal title: Naučnyj rezulʹtat
Year: 2022
ISSN: 2518-1092
DOI: https://doi.org/10.18413/2313-8912-2022-8-3-0-6